
Fix qwen3 30b #8

Merged: JohannesGaessler merged 2 commits into JohannesGaessler:ggml-meta-backend-8 from gaugarg-nv:fix_qwen3_30b on Feb 25, 2026

Conversation

@gaugarg-nv

Qwen-30B-A3B Q4_0 has an intermediate dimension of 768. Using a granularity of 256 forces an uneven split between GPUs, which is not supported by the current implementation.
@gaugarg-nv gaugarg-nv changed the base branch from master to ggml-meta-backend-8 February 24, 2026 11:36
@JohannesGaessler JohannesGaessler merged commit 87e172b into JohannesGaessler:ggml-meta-backend-8 Feb 25, 2026
JohannesGaessler pushed a commit that referenced this pull request Mar 7, 2026
* Fix crash with Qwen-30B-A3B Q4_0

Qwen-30B-A3B Q4_0 has an intermediate dimension of 768. Using a granularity of 256 forces an uneven split between GPUs, which is not supported by the current implementation.

* Decide block size based on tensor quantization type
JohannesGaessler pushed a commit that referenced this pull request Mar 8, 2026 (same commit message as above).

2 participants